Introduction
Human observers have a remarkable ability to identify thousands of different things in the world, including people, animals, artifacts, structures, and places. Many of the things we typically encounter are objects – compact entities that have a distinct shape and a contour that allows them to be easily separated from their visual surroundings. Examples include faces, blenders, automobiles, and shoes. Studies of visual recognition have traditionally focused on object recognition; for example, investigations of the neural basis of object and face coding in the ventral visual stream are plentiful (Tanaka, 1993; Tsao and Livingstone, 2008; Yamane et al., 2008).
Some recognition tasks, however, involve analysis of the entire scene rather than just individual objects. Consider, for example, the situation in which one walks into a room and must determine whether it is a kitchen or a study. Although one might perform this task by first identifying the objects in the scene and then deducing the identity of the surroundings from this list, this would be a relatively laborious process, one that does not fit with our intuition (and behavioral data) that we can identify the scene quite rapidly. Consider as well the challenge of identifying one's location during a walk around a city or a college campus, or through a natural wooded environment. Although we can perform this task by identifying distinct object-like landmarks (buildings, statues, trees, etc.), we also seem to have some ability to identify places based on their overall visual appearance.